A Flexible Multitask Summarizer for Documents from Different Media, Domain, and Language

Author

  • Maria Fuentes Fort
Abstract

Automatic summarization is probably crucial given the increase in document generation, particularly now that retrieving, managing, and processing information have become decisive tasks. However, one should not expect perfect systems able to substitute for human summaries. The automatic summarization process depends strongly not only on the characteristics of the documents but also on the different needs of users. Thus, several aspects have to be taken into account when designing an information system for summarizing, because different techniques can be applied depending on the characteristics of the input documents and the desired results. To support this process, the final goal of the thesis is to provide a flexible multitask summarizer architecture. This goal is decomposed into three main research purposes. First, to study the process of porting systems to different summarization tasks, processing documents in different languages, domains, or media, with the aim of designing a generic architecture that permits the easy addition of new tasks by reusing existing tools. Second, to develop prototypes for some tasks involving aspects related to the language, the media, and the domain of the document or documents to be summarized, as well as aspects related to the summary content: generic summaries, novelty summaries, or summaries that answer a specific user need. Third, to create an evaluation framework to analyze the performance of several approaches in the written news and scientific oral presentation domains, focusing mainly on intrinsic evaluation.


Similar resources

Different approaches for identifying important concepts in probabilistic biomedical text summarization

Automatic text summarization tools help users in the biomedical domain to acquire their intended information from various textual resources more efficiently. Some biomedical text summarization systems base their sentence selection approach on the frequency of concepts extracted from the input text. However, it seems that exploring other measures rather than the raw frequency for ...


Quantifying the informativeness for biomedical literature summarization: An itemset mining method

OBJECTIVE Automatic text summarization tools can help users in the biomedical domain to access information efficiently from a large volume of scientific literature and other sources of text documents. In this paper, we propose a summarization method that combines itemset mining and domain knowledge to construct a concept-based model and to extract the main subtopics from an input document. Our ...


Interpersonal Metadiscourse in Newspaper Editorials

The power of media lies in its persuasive function, which gives media a potential to maneuver on the mind of audience (van Dijk 1996). This potential is realized via different linguistic resources, one important group of which is metadiscoursal resources. The major aim of this study was to explore how and in what distribution these resources are employed by writers with different cultural backg...


A Model for Extracting Information from Textual Documents, Based on Text Mining, in the E-Learning Domain

As computer networks become the backbones of science and the economy, enormous quantities of documents become available. Text mining techniques have therefore been used to extract useful information from textual data. Text mining has become an important research area that discovers unknown information, facts, or new hypotheses by automatically extracting information from different written documents. T...


Ijaz: An Operational System for Single-Document Summarization of Persian News Texts

The rapid growth of published documents on the web has created new demands for processing, classification, and information retrieval. As a result, the use of natural language processing tools has increased around the world. Automatic summarization is known as the core of a wide range of text-processing tools such as decision systems, accountability systems, search engines, etc. And always has been inv...



Publication date: 2008